Translation and Dictionary
Iris flower data set : English Wikipedia
Iris flower data set

The ''Iris'' flower data set or Fisher's ''Iris'' data set is a multivariate data set introduced by Ronald Fisher in his 1936 paper ''The use of multiple measurements in taxonomic problems'' as an example of linear discriminant analysis. It is sometimes called Anderson's ''Iris'' data set because Edgar Anderson collected the data to quantify the morphologic variation of ''Iris'' flowers of three related species. Two of the three species were collected in the Gaspé Peninsula "all from the same pasture, and picked on the same day and measured at the same time by the same person with the same apparatus".
The data set consists of 50 samples from each of three species of ''Iris'' (''Iris setosa'', ''Iris virginica'' and ''Iris versicolor''). Four features were measured from each sample: the length and the width of the sepals and petals, in centimetres. Based on the combination of these four features, Fisher developed a linear discriminant model to distinguish the species from each other.
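The idea behind a two-class linear discriminant can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Fisher's original computation: it uses only a 10-row excerpt (the first five ''Iris versicolor'' and ''Iris virginica'' records, with values as found in commonly distributed copies of the data set), it keeps only the two petal features so that the 2x2 scatter matrix can be inverted in closed form, and the names `fisher_direction` and `classify` are hypothetical.

```python
# Excerpt: (petal length, petal width) in cm, first five records per species.
# Assumption: values follow the commonly distributed copy of the data set.
VERSICOLOR = [(4.7, 1.4), (4.5, 1.5), (4.9, 1.5), (4.0, 1.3), (4.6, 1.5)]
VIRGINICA = [(6.0, 2.5), (5.1, 1.9), (5.9, 2.1), (5.6, 1.8), (5.8, 2.2)]


def mean(points):
    """Component-wise mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter(points, centre):
    """Return (s_xx, s_xy, s_yy): sums of squared deviations and cross products."""
    s_xx = sum((p[0] - centre[0]) ** 2 for p in points)
    s_xy = sum((p[0] - centre[0]) * (p[1] - centre[1]) for p in points)
    s_yy = sum((p[1] - centre[1]) ** 2 for p in points)
    return s_xx, s_xy, s_yy


def fisher_direction(class_a, class_b):
    """Direction w = Sw^-1 (mean_b - mean_a) plus a midpoint threshold."""
    m_a, m_b = mean(class_a), mean(class_b)
    a, b = scatter(class_a, m_a), scatter(class_b, m_b)
    # Pooled within-class scatter matrix Sw (symmetric 2x2).
    s_xx, s_xy, s_yy = a[0] + b[0], a[1] + b[1], a[2] + b[2]
    det = s_xx * s_yy - s_xy ** 2
    dx, dy = m_b[0] - m_a[0], m_b[1] - m_a[1]
    # Closed-form inverse of the 2x2 matrix applied to the mean difference.
    w = ((s_yy * dx - s_xy * dy) / det, (s_xx * dy - s_xy * dx) / det)
    mid = ((m_a[0] + m_b[0]) / 2, (m_a[1] + m_b[1]) / 2)
    threshold = w[0] * mid[0] + w[1] * mid[1]
    return w, threshold


def classify(point, w, threshold):
    """Project the point onto w and compare with the threshold."""
    projection = w[0] * point[0] + w[1] * point[1]
    return "virginica" if projection > threshold else "versicolor"


w, threshold = fisher_direction(VERSICOLOR, VIRGINICA)
for p in VERSICOLOR + VIRGINICA:
    print(p, classify(p, w, threshold))
```

The species labels are needed to compute the class means and the pooled scatter, which is why a model of this kind counts as a supervised technique.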
==Use of the data set==

Based on Fisher's linear discriminant model, this data set became a typical test case for many statistical classification techniques in machine learning, such as support vector machines.
The data set is rarely used in cluster analysis, however, because it contains only two clusters with rather obvious separation. One cluster contains ''Iris setosa'', while the other contains both ''Iris virginica'' and ''Iris versicolor'' and cannot be separated without the species information Fisher used. This makes the data set a good example for explaining the difference between supervised and unsupervised techniques in data mining: Fisher's linear discriminant model can be obtained only when the species of each object is known, because class labels and clusters are not necessarily the same.
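The two-cluster behaviour can be illustrated with a small clustering sketch. The text does not name a clustering algorithm, so k-means (plain Lloyd iteration) is used here purely as an assumption. The data are a 15-row excerpt (the first five records per species, as found in commonly distributed copies of the data set), and the function name `k_means` and the choice of starting centroids are hypothetical.

```python
# Excerpt: (sepal length, sepal width, petal length, petal width) in cm.
# Assumption: values follow the commonly distributed copy of the data set.
SAMPLES = [
    ("setosa", (5.1, 3.5, 1.4, 0.2)),
    ("setosa", (4.9, 3.0, 1.4, 0.2)),
    ("setosa", (4.7, 3.2, 1.3, 0.2)),
    ("setosa", (4.6, 3.1, 1.5, 0.2)),
    ("setosa", (5.0, 3.6, 1.4, 0.2)),
    ("versicolor", (7.0, 3.2, 4.7, 1.4)),
    ("versicolor", (6.4, 3.2, 4.5, 1.5)),
    ("versicolor", (6.9, 3.1, 4.9, 1.5)),
    ("versicolor", (5.5, 2.3, 4.0, 1.3)),
    ("versicolor", (6.5, 2.8, 4.6, 1.5)),
    ("virginica", (6.3, 3.3, 6.0, 2.5)),
    ("virginica", (5.8, 2.7, 5.1, 1.9)),
    ("virginica", (7.1, 3.0, 5.9, 2.1)),
    ("virginica", (6.3, 2.9, 5.6, 1.8)),
    ("virginica", (6.5, 3.0, 5.8, 2.2)),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, initial_centroids, max_iter=100):
    """Plain Lloyd iteration; returns the cluster index of every point."""
    centroids = list(initial_centroids)  # copy so the caller's list is untouched
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster: converged
        assignment = new_assignment
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignment


points = [features for _, features in SAMPLES]
# The species labels are never passed to k_means; they are only printed
# afterwards for comparison with the clusters found.
labels = k_means(points, [points[0], points[-1]])
for (species, _), cluster in zip(SAMPLES, labels):
    print(species, cluster)
```

On this excerpt, with two clusters requested, the ''Iris setosa'' records form one cluster and the ''Iris versicolor'' and ''Iris virginica'' records share the other. The algorithm never sees the labels, so it has no basis for splitting the second group along species lines.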
Nevertheless, all three species of ''Iris'' are separable in the projection on the nonlinear branching principal component.〔A.N. Gorban, N.R. Sumner, and A.Y. Zinovyev, Topological grammars for data approximation, Applied Mathematics Letters, Volume 20, Issue 4 (2007), 382-386.〕 The data set is approximated by the closest tree, with some penalty for an excessive number of nodes, bending and stretching. Then the so-called "metro map" is constructed. The data points are projected onto the closest node, and for each node a pie diagram of the projected points is prepared, with an area proportional to the number of projected points. The diagram shows that the absolute majority of the samples of the different ''Iris'' species belong to different nodes. Only a small fraction of ''Iris virginica'' is mixed with ''Iris versicolor'' (the mixed blue-green nodes in the diagram). Therefore, the three species of ''Iris'' (''Iris setosa'', ''Iris virginica'' and ''Iris versicolor'') are separable by the unsupervised procedures of nonlinear principal component analysis. To discriminate them, it is sufficient to select the corresponding nodes on the principal tree.
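The projection step alone (assigning each point to its closest node and tallying the species per node, which is the information each pie diagram displays) can be sketched as follows. This does not fit a principal tree: the node positions in `NODES` are made up for illustration and are not taken from the cited paper, only the two petal features of a 15-row excerpt are used, and the function name `project_to_nodes` is hypothetical.

```python
from collections import Counter

# Excerpt: (petal length, petal width) in cm, first five records per species.
# Assumption: values follow the commonly distributed copy of the data set.
SAMPLES = [
    ("setosa", (1.4, 0.2)), ("setosa", (1.4, 0.2)), ("setosa", (1.3, 0.2)),
    ("setosa", (1.5, 0.2)), ("setosa", (1.4, 0.2)),
    ("versicolor", (4.7, 1.4)), ("versicolor", (4.5, 1.5)),
    ("versicolor", (4.9, 1.5)), ("versicolor", (4.0, 1.3)),
    ("versicolor", (4.6, 1.5)),
    ("virginica", (6.0, 2.5)), ("virginica", (5.1, 1.9)),
    ("virginica", (5.9, 2.1)), ("virginica", (5.6, 1.8)),
    ("virginica", (5.8, 2.2)),
]

# Hypothetical node positions, NOT the nodes of a fitted principal tree.
NODES = [(1.4, 0.2), (4.5, 1.4), (5.2, 1.9), (5.9, 2.2)]


def project_to_nodes(samples, nodes):
    """Assign every sample to its closest node and count species per node."""
    tallies = [Counter() for _ in nodes]
    for species, point in samples:
        nearest = min(
            range(len(nodes)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(point, nodes[i])),
        )
        tallies[nearest][species] += 1
    return tallies


for index, tally in enumerate(project_to_nodes(SAMPLES, NODES)):
    # The pie for a node would have an area proportional to sum(tally.values()).
    print(index, dict(tally))
```

The species labels are used here only to colour the tallies after projection; the assignment of points to nodes depends on the distances alone.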

Excerpt source: Wikipedia, the free encyclopedia